Transcriptomic and metabolomic analysis reveals the difference between large and small flower taxa of Herba Epimedii during flavonoid accumulation

Herba Epimedii, as a traditional Chinese herb, is divided into large and small flower taxa, and can invigorate sexuality and strengthen muscles and bones. Herba Epimedii is rich in flavonoids, which largely contribute to its medicinal benefits. In our previous studies, we have found that the flavonoids content was much more in small than large flower taxa. To further identify molecular mechanisms of flavonoids metabolism in Herba Epimedii, combined metabolome and transcriptomic analyses were performed to profile leaves and flowers. Association analysis revealed that the expression of genes involved in flavonoid biosynthesis showed significant differences between small and large flower taxa. Eleven flavonols significantly increased in small compared to large flower taxa. Moreover, genes encoding O-methyltransferase played crucial roles in flavonoids metabolism by an integrated analysis. Taken together, these data highlight the breeding tendency of small flower taxa to improve the quality of Herba Epimedii.


Materials and methods
Sample collection. Samples were collected from Chognqing, Hunan Province, Sichuan Province, Guizhou Province, China, and identified as large or small flower taxa of Herba Epimedii. Flowers and leaves were collected, immediately immersed in liquid nitrogen and stored at − 80 °C prior to transcriptomic and metabolomic analyses. Flowers derived from six cultivars were used for transcriptomic and metabolomic analysis, including three large and three small flower taxa cultivars. Leaves derived from four cultivars were used for transcriptomic and metabolomic analysis, including two large and two small flower taxa cultivars. Details of all samples are shown in Supplementary Table S4.
Sample extraction and metabolome analysis. Flowers and leaves were freeze-dried and crushed for 1.5 min at 30 Hz using a mixer mill (MM400, Retsch, Germany). Subsequently, 100 mg powder was weighed and extracted overnight at 4 °C in 1.0 mL of 70% methanol. During this period, samples were vortexed (10 s, 40 Hz) once every 10 min on a total of three occasions. Following centrifugation at 10,000 g for 10 min at 4 °C, extracts were filtered through a 0.22 µm microporous membrane and stored in sample vials. The QC was prepared by mixing all samples. To examine the reproducibility of the analysis process, a QC sample was injected after five test samples during instrumental analysis.
Sample extracts were analyzed using a UPLC-ESI-MS/MS system (HPLC, Shim-pack UFLC SHIMADZU CBM30A system; MS, Applied Biosystems 4500 Q TRAP). UPLC separation was completed on a Waters Acquity UPLC HSS T3 C18 column (1.8 µm, 2.1 mm × 100 mm) (Waters Corp., Milford, MA, USA). The solvent system consisted of water containing 0.04% acetic acid and acetonitrile containing 0.04% acetic acid. The gradient program was as follows: 100:0 V/V at 0 min, 5:95 V/V at 11.0 min, 5:95 V/V at 12.0 min, 95:5 V/V at 12.1 min, 95:5 V/V at 15.0 min. The flow rate was 0.4 mL/min and the injection volume was 5 µL. The column temperature was set to 40 °C. Effluents were alternatively connected to an ESI-triple quadrupole-linear ion trap (Q TRAP)-MS. Linear ion trap (LIT) and triple quadrupole (QQQ) scans were performed using QTRAP. ESI source operation parameters were as follows: ESI temperature: 550 •C; mass spectrometry voltage: 5500 V, ion source gas I (GSI), gas II (GSII), curtain gas (CUR): 55, 60, and 25.0 psi, respectively. QQQ scans were obtained as MRM experiments with collision gas (nitrogen) set to 5 psi. DP and CE for individual MRM transitions were optimized in the QQQ. A specific set of MRM transitions were monitored during each period according to the metabolites eluted within this period.
Based on public databases of metabolites and the MVDB V2.0 database of Wuhan Metware Biotechnology Co., Ltd. (Wuhan, China), qualitative analysis of primary and secondary mass spectrometry data were performed by referencing existing mass spectrometry databases such as MassBank, KNAPSAcK, HMDB and METLIN. Structural analysis of the metabolites was therefore determined. Regarding the quantitative analysis of the metabolites, MRM was used and PCA and OPLS-DA were performed to identify differential metabolites. We divided cultivars into large and small flower taxa group, and then conducted a comparative analysis of large and small flower taxa group. A VIP > 1 and |Log2 (small flower taxa group/large flower taxa group) |≥ 1 were set for metabolites with significant differences. RNA isolation and transcriptomics. Total  . Raw data was filtered through the removal of reads with adapter sequences and low-quality reads using fastp software [50]. After filtering, clean reads were assembled to obtain unigenes. Unigene functions were annotated based on NCBI non-redundant protein sequences (Nr, https:// ftp. ncbi. nlm. nih. gov/ blast/ db/ FASTA/), KEGG (https:// www. genome. jp/ kegg), Clusters of Orthologous Groups of proteins (COG, https:// www. ncbi. nlm. nih. gov/ COG/), GO (https:// www. geneo ntolo gy. org), the Swiss-Prot (http:// www. ebi. ac. uk/ unipr ot/) and the translation of EMBL (TrEMBL) databases. Similarly, we conducted a comparative analysis of large and small flower taxa group. Differentially expressed genes (DEGs) were screened using the following criteria: fold change (Log2) > 1 and P < 0.05. DEGs were subjected to Gene Ontology (GO) (http:// geneontology.org/) and Kyoto Encyclopedia of Genes and Genomes (KEGG) (http:// www. genome. jp/ kegg/) enrichment analyses to determine their biological functions and metabolic pathways [21][22][23] .

Integrative metabolomic and transcriptomic analysis. Transcriptomic and metabolomic data for
Herba Epimedii flowers and leaves were analyzed. Metabolite content of the different flower taxon and gene expression values in the transcriptomic datasets were analyzed. DEGs in flavonoid biosynthesis and differentially abundant flavonoids in each comparison group (three replicates each) were subjected to correlation analysis. Spearman's correlation coefficients were used to determine differences between flavonoid biosynthesis genes and transcription factor-encoded genes through the selection of genes with correlation coefficients ≥ 0.90 and correlation p-values ≤ 0.05. O-methyltransferase encoding genes related to flavonoid biosynthesis were selected for analysis. . Sample purity was confirmed as ≥ 98%. Standard samples were dissolved in methanol to obtain 1 µg/mL standard working solutions. Samples were stored at − 20 °C and used within one month.
Sample preparation (~ 100 mg) was performed as described. Extracts of sample solutions were passed through a 0.22 µm syringe nylon filter and stored at − 20 °C prior to analysis.
An Applied Biosystems TripleTOF4600 MS/MS spectrometer equipped with a version of 1.6 Analyst software (AB SCIEX, Massachusetts, USA) was used for analysis. Mass spectrometry was performed on an electrospray ion source using the following parameters: positive ion mode (ESI +); positive ion scanning and IDA (information association acquisition); ion source spray voltage, 5500 V; Ion source temperature, 500; Curtain gas (CUR), 164 kPa; Atomized gas (GS1), 355 kPa; Auxiliary gas (GS2), 355 kPa; Cluster cracking voltage, 55 V; Collision energy, 40 V; Collision energy rolling range 15 V. Scanning range m/z 50-1000 compounds with monitoring response values over 1000 cps were set to obtain daughter ion information.
qRT-PCR analysis. Six unigenes related to flavonoid biosynthesis were selected for validation using qRT-PCR. RNA was reverse-transcribed to cDNA using PrimeScript TM RT reagent Kits (TaKaRa, Dalian, China). qRT-PCRs were performed using the SYBRfiPremix ExTaq TM (TaKaRa, Dalian, China) on a Stepone Real-Time PCR System (ABI, USA). qRT-PCR specific primers were designed using Beacon Designer 8.0 with sequences listed in Table S5. Thermocycling conditions were follows: 95 °C for 5 min; 40 cycles of 30 s at 95 °C, 30 s at 59 °C, and 30 s at 72 °C. The dissolution curve program was used to determine reaction specificity as follows: 60 °C for 15 s followed by 95 °C for 15 s. Relative gene expression was calculated according to the 2 −ΔΔCT method. Three biological replicates were analyzed for each gene assayed 24 . A total of 182 and 143 metabolites with known structures were selected between large and small flower taxa under quality validation in the flowers and leaves (three biological replicates of each). Detailed information on the identified metabolites, including compounds, classes, molecular weights, ionization models, Kyoto encyclopedia of genes and genomes (KEGG) pathways are shown in Supplementary Table S1. In flowers, flavonols (48.35%), flavonoids (26.37%), Flavanols (6.04%) and dihydroflavone (3.85%) accounted for a large proportion of these 182 metabolites (Fig. 3A). In leaves, flavonols (39.16%), flavonoids (27.97%), isoflavones (7.69%) anthocyanins (6.29%) and flavanols (5.59%) accounted for the main proportion of the 143 metabolites (Fig. 3C).

Identification of differentially accumulated metabolites in flavonoids.
Differentially accumulated metabolites (DAMs) were defined as those exhibiting a fold change ≥ 2 or ≤ 0.5 and a variable importance of projection (VIP) ≥ 1. To further understand the function of flavonoid DAMs and their related biological processes, pathway enrichment analysis was performed using KEGG. Interestingly, all flavonoid pathways were not significantly enriched in flowers and leaves, whilst DAMs were specifically enriched in flavonoid pathways in large and small flower taxa (Fig. 3B,D). These data further indicated that flavonoid metabolism in Herba Epimedii was significantly influenced by the flower taxa.
A total of 50 and 39 DAMs involved in flavonoid metabolism were identified in flowers and leaves, respectively. In flowers, 43 DAMs were upregulated and 7 were downregulated. Flavonal DAMs accounted for the  Genes with |log2 (fold change)|> 1 and p < 0.05 were defined as differentially expressed genes (DEGs). To further understand the functions of the DEGs and their related biological processes, gene ontology (GO) and KEGG analyses were performed. GO analysis classified the DEGs into three categories: "molecular functions", "cellular components", and "biological processes" with a total of 226 and 356 GO terms in flowers and leaves. In flowers, enriched GO terms included "xyloglucan: xyloglucosyl transferase activity", "anthocyanidin 3-O-glucosyltransferase activity", "ferric iron binding" and "ribonuclease T2 activity" within molecular function, "nuclear nucleosome", "lipid droplet", "primary cell wall" and "central vacuole" within cellular component and "xyloglucan metabolic process", "hemicellulose metabolic process", and "fatty acid derivative metabolic process" within biological process (Supplementary Table S3). In leaves, "exopeptidase activity", "oxidoreductase activity, acting on paired donors, with incorporation or reduction of molecular oxygen, NAD(P)H as one donor, and incorporation of one atom of oxygen", "carboxylic ester hydrolase activity" and "O-methyltransferase activity" within molecular function, "plant-type vacuole", "plant-type vacuole membrane" and "mRNA cap binding complex" within cellular components and "lipid catabolic processes", "pigment biosynthetic processes" and "auxin metabolic processes" within biological processes (Supplementary Table S3). Pathway enrichment analysis of the DEGs using KEGG www.nature.com/scientificreports/ identified significantly enriched "metabolic pathways" and "biosynthesis of secondary metabolites" in flowers and leaves (Fig. 4A,B). These transcriptomic analyses confirmed that the flower taxa significantly influenced the metabolic pathways of Herba Epimedii.
Association analysis of the DAMs and DEGs. Correlation analysis was next performed on the DAMs and DEGs. Variations in the metabolites and their corresponding genes (Pearson correlation coefficient ≥ 0.8) were selected to construct nine quadrant diagrams and correlation coefficient cluster heat maps. In flowers, larger numbers of DAMs and DEGs were present in the first and third quadrants, with a negative correlation in the first quadrant and a positive correlation in the third quadrant (Fig. 5A). In leaves, a higher number of DAMs and DEGs were present in the seventh and ninth quadrants, which showed a positive correlation in the seventh quadrant and a negative correlation in the ninth quadrant (Fig. 5B). To understand the regulatory network of flavonoid biosynthesis in Herba Epimedii, a network analysis of DEGs encoding biosynthetic enzymes and DAMs was conducted in flower (Fig. 6A) and leaves (Fig. 6B). Based on the result of DEGs and DAMs, which were both enriched in flavonoid biosynthesis pathway, 22 structural genes showed higher correlation coefficient values (R > 0.9) with 10 flavonoids in flower, while 9 structural genes showed higher correlation coefficient values (R > 0.9) with 12 flavonoids in leaves (Supplementary Table S6 Figure 7 shows the TIC diagrams of 16 standards in the positive ion mode. Method validation including linearity, limit of detection (LOD), limit of quantification (LOQ), stability, precision, and reproducibility are shown in Table 2, with an R 2 higher than 0.99916, relative standard deviations (RSDs) of the stability lower than 1.5%, RSDs of the precision ranging from 0.34 to 1.52%, and RSDs of repeatability in the range of 0.74-1.76%. These data highlight the reliability of the method. Flavonoid accumulation in the aerial regions of Herba Epimedii showed significant differences between large and small flower taxa. Compared with the four large flower taxa cultivars (S1, S2, S3, S4), remarkably higher levels of Icariin, Icariside I, Icariside II, Epimedin A, Epimedin B, Epimedin C, Sagittatoside A, Sagittatoside B, 2''-O-rhamnosyl icariside II, Epimedoside A, Baohuoside V and Acuminatin accumulated in the four small flower taxa cultivars (S5, S6, S7, S8). Baohuoside II was absent in the small flower taxa excluding S3. Interestingly, the content of Desmethylicaritin was higher than the other seven cultivars. The content of Anhydroicaritin in S7 was minimal in the small flower taxa cultivars. The content of Maohuoside A showed no significant differences between large and small flower taxa. Detailed data are shown in Fig. 8

Discussion
Previous studies have analyzed critical metabolic genes in medicinal plants using metabolic and transcriptomic analysis of different organs 9,11-15 . Flavonoids are a major source of the pharmacological activities of medicinal plants and regulate several important biological processes. A total of 10 transcription factor-encoding genes and 14 flavonoid biosynthesis genes were identified in A. roxburghii 25 . However, very little is known about flavonoid composition and the molecular mechanism of flavonoid biosynthesis in Herba Epimedii. Previous studies showed that the total flavonoid content of small flower taxa group was much higher than that of large flower taxa group using ultra performance liquid chromatography/tandem mass spectrometry (UPLC-MS/MS) such as Epimedium A, Epimedium B and Epimedium C (Fig. 8). Preliminary identification of the candidate genes involved in flavonoid accumulation were performed through comparison of the groups and organs. Transcriptome libraries of the flowers and leaves of different groups of Epimedium were constructed using high-throughput sequencing technologies to identify active pharmaceutical compounds and their biosynthetic mechanisms. Flowers and leaf flavonoid metabolites in Epimedium were identified by UPLC-MS/MS. Once sequences were assembled, 36, 293 and 18, 505 differentially regulated genes were identified in the flowers and leaves, respectively. We identified 143 flavonoid metabolites in Epimedium using metabolomics. These transcriptomic and metabolomic data provide an important basis for the further exploration of metabolic pathways in Epimedium. Flavonoids are considered one of the major pharmaceutical components of Epimedium. According to KEGG analysis, flavonoid metabolites that differentially accumulate in the different organs of Epimedium include flavonols, flavonoid, flavanols, dihydroflavone, isoflavones, dihydroflavonol, flavonoid carbonoside anthocyanins, tannin, proanthocyanidins, chalcones and sinensetin biosynthetic pathways, which were significantly enriched. This is important information required to elucidate the molecular regulatory mechanisms underlying flavonoid accumulation.
Metabolomic analysis of the flowers revealed 43 flavonoid metabolites with greater levels in the small flower group, including Epimedin A, Epimedin B, Epimedin C, sagittatoside A, magittatoside B, maohuoside A, maohuoside B, icariside I, icariside II and icariin. Moreover, 10 flavonoid metabolites with a higher content in small Figure 5. Quadrant diagrams representing the association of the DAMs and DEGs be-tween large flower taxa and small flower taxa in Herba Epimedii. The x-axis represents that the log2 ratio of gene and the y-axis represents the log2 ratio of metabolite; black dot-ted lines represent the different threshold; each point represents a gene or metabolite; black dots represent the unchanged genes or metabolites; green dots represent differentially ac-cumulated metabolites with unchanged genes; red dots represent differentially expressed genes with unchanged metabolites; blue dots represent both differentially expressed genes and differentially accumulated metabolites; it is divided into ①-⑨ quadrants from left to right and from top to bottom with black dotted lines; the ①, ②and ④ quadrants indicate that the expression abundance of metabolites is higher than that of genes; the ③ and ⑦ quadrants indicate that the expression patterns of genes are consistent with the metabolites; the ⑤ quadrant indicates that the genes and metabolites are not differentially expressed; the ⑥, ⑧ and ⑨ quadrants indicate that the expression abundance of metabolites is lower than that of genes. In recent years, icariin has been shown to exhibit numerous pharmacological activities including anti-oxidant, anti-inflammatory, and anti-apoptotic effects. These may contribute to the described preventive and/or therapeutic benefits of icariin in various disorders in the nervous system, including cerebral ischemia, Alzheimer's disease (AD), Parkinson's disease (PD), multiple sclerosis (MS), and depression [26][27][28][29] . Several flavonoids including epimedin A, epimedin B and epimedin C were isolated from Herba Epimedii, and showed significant effects on neurite outgrowth in cultured rat pheochromocytoma (PC12h) cells 30 .
In general, the leaves of Herba Epimedii are known to possess medicinal benefits. Nonetheless, analysis of the metabolites in the leaves of the two groups demonstrated that metabolites involved in flavonoid biosynthesis exhibit differential accumulation patterns, suggesting that suitable manipulation of the different flower groups can lead to improved flavonoid use and efficacy. These findings contribute to the evaluation of medicinal plants and increase our understanding of the applications of different Herba Epimedii groups.    Flavonoids are a class of 2-phenylchroman (flavan) derivatives with hydroxyl, methoxy, or hydroxy isoprene as substituents. They are methylated or glycosylated on the C4, C3 and C7 hydroxyl of the parent nucleus. This study focused on the regulatory networks of methylation in Epimedii and compared differences in O-methyltransferase  www.nature.com/scientificreports/ expression between the leaves and flowers of the two groups. According to the results, 129 differentially expressed O-methyltransferases were observed between the leaves and flowers of large and small groups, which could be classified into 21 families. Differential expression of these O-methyltransferases is of particular importance to our understanding of the mechanisms underlying the metabolic differences between small and large flowers of Herba Epimedii.

Conclusions
In our study, we found that the total flavonoids content was remarkably more in small flower taxa than large flower taxa of Herba Epimedii. The transcriptome and metabolomics responses of Herba Epimedii were evaluated between large flower taxa and small flower taxa, it provides much needed information on the factors that dictate flavonoid accumulation and its related genes. Small flower groups possess higher levels of flavonoids due to the